A Fault-Tolerant System for Balancing the Load of Data-Parallel Applications
Abstract: In distributed computing environments, fault tolerance is an important objective, especially for parallel applications. Many distributed computing environments achieve fault tolerance by periodic checkpointing. This approach is relatively easy to implement and can be considered equivalent to task migration. However, such environments have two main disadvantages. First, any work done after the most recent checkpoint is lost when a fault occurs. Second, these systems rely on task migration as their only mechanism for load balancing. This paper presents a system that overcomes these shortcomings through task duplication and by integrating data migration into task migration as a load balancing mechanism. It also presents results from a preliminary implementation.
Similar resources
ParLeda: A Library for Parallel Processing in Computational Geometry Applications
ParLeda is a software library that provides the basic primitives needed for parallel implementation of computational geometry applications. It can also be used in implementing a parallel application that uses geometric data structures. The parallel model that we use is based on a new heterogeneous parallel model named HBSP, which is based on BSP and is introduced here. ParLeda uses two main lib...
Fault Tolerant Scheduling for Parallel Loops on Shared Memory Systems
While multicore/multiprocessor systems achieve significant speedup for many applications by exploiting loop level parallelism, they also suffer from increased reliability problems as a result of ever scaling device size. This paper addresses the reliability of loop dominated applications, aiming to execute parallel loops efficiently in the presence of various types of hardware faults. In this p...
Application Recovery in Parallel Programming Environment
In this paper, the fault-tolerance feature of the TOPAS parallel programming environment for distributed systems is presented. TOPAS automatically analyzes data dependences among tasks and synchronizes data, which reduces the time needed for parallel program development. TOPAS also provides support for scheduling, load balancing and fault tolerance. The main topic of this paper is to present the solut...
Fault-Tolerant Parallel Programming with Atomic Actions
The Pact (parallel actions) parallel programming environment provides an easy-to-use parallel execution and synchronization model based on task parallelization. To give the programmer an abstraction for global data (even on distributed memory machines), the Pact runtime system uses virtual shared memory. Execution efficiency is improved with data-dependent dynamic load balancing and latency-ma...
NASA Contractor Report 181938 Investigation of the Applicability of a Functional Programming Model to Fault Tolerant Parallel Processing for Knowledge-Based Systems
In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation...
Publication date: 2007